Regulon Enrichment Analysis using GSEA
#Written by Hanhae Kim. Contact kimhanhae@gmail.com for more information

1. A Regulon
We define a Regulon as a gene with more than 15 neighbors (linkages) in the functional gene network. Adjusting the size cutoff of a Regulon needs discussion with Dr. Insuk Lee. Normally, thousands of Regulons are generated under the 15-neighbor cutoff.
-If you use any other ordinary gene sets, it is plain Gene Set analysis. So, in Regulon Enrichment Analysis (REA), the Regulons are the gene sets.
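
The definition above can be sketched in Python as follows. This is only a minimal sketch under assumptions that are not in this note: the network is given as an undirected list of gene pairs, the members of a Regulon are the neighbors of the hub gene (the hub itself is not included), and the names build_regulons and regulons_to_gmt_lines are made up for illustration. The output lines follow the GMT layout (set name, description, then member genes, tab separated).

```python
from collections import defaultdict

MIN_NEIGHBORS = 15  # cutoff from the text: a Regulon needs MORE than 15 neighbors


def build_regulons(edges, min_neighbors=MIN_NEIGHBORS):
    """Return {hub_gene: sorted neighbor list} for genes above the neighbor cutoff.

    Assumption: edges is an iterable of (gene_a, gene_b) pairs from an
    undirected functional gene network.
    """
    neighbors = defaultdict(set)
    for gene_a, gene_b in edges:
        if gene_a == gene_b:
            continue  # ignore self-links
        neighbors[gene_a].add(gene_b)
        neighbors[gene_b].add(gene_a)
    return {
        hub: sorted(linked)
        for hub, linked in neighbors.items()
        if len(linked) > min_neighbors
    }


def regulons_to_gmt_lines(regulons):
    """Format each Regulon as one GMT line: name, description, member genes."""
    lines = []
    for hub in sorted(regulons):
        fields = [hub + "_regulon", "neighbors of " + hub] + regulons[hub]
        lines.append("\t".join(fields))
    return lines


# Toy network: HUB1 has 16 neighbors and passes the cutoff; HUB2 has only 3.
toy_edges = [("HUB1", "G%d" % i) for i in range(16)]
toy_edges += [("HUB2", "G0"), ("HUB2", "G1"), ("HUB2", "G2")]
toy_regulons = build_regulons(toy_edges)
toy_gmt = regulons_to_gmt_lines(toy_regulons)
```

If your project decides that the hub gene should also be a member of its own Regulon, add it to the member list before writing the GMT line.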

2. Make file format
To use Gene Set Enrichment Analysis (GSEA, www.broadinstitute.org/gsea ), you need three kinds of data files.
Simply put:
1) GCT file: expression data
2) GMT file: gene sets (in this case, Regulons)
3) CLS file: description of the GCT file, such as conditions and number of arrays

For more details, follow the link below and prepare your files in these formats:
http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats
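
As a rough illustration of the GCT and CLS layouts described on that page, here is a minimal Python sketch. The function names make_gct and make_cls, the sample names, and the toy values are all made up for this example; it assumes a two-class (categorical) CLS file and uses "na" as the description column. Check the result against the Data formats page before using it on real data.

```python
def make_gct(sample_names, expression):
    """Build GCT (version 1.2) text.

    expression: {gene_name: [one value per sample]}, in sample_names order.
    """
    lines = ["#1.2", "%d\t%d" % (len(expression), len(sample_names))]
    lines.append("\t".join(["NAME", "Description"] + list(sample_names)))
    for gene, values in expression.items():
        if len(values) != len(sample_names):
            raise ValueError("row %s does not match the number of samples" % gene)
        lines.append("\t".join([gene, "na"] + [str(v) for v in values]))
    return "\n".join(lines) + "\n"


def make_cls(labels):
    """Build categorical CLS text.

    labels: one class label per sample, in the same order as the GCT columns.
    """
    classes = []
    for label in labels:
        if label not in classes:
            classes.append(label)  # keep classes in first-seen order
    header = "%d %d 1" % (len(labels), len(classes))
    return "\n".join([header, "# " + " ".join(classes), " ".join(labels)]) + "\n"


# Toy data (hypothetical): 2 genes measured on 4 arrays, 2 conditions.
samples = ["ctrl_1", "ctrl_2", "case_1", "case_2"]
toy_expression = {
    "GENE_A": [1.0, 1.2, 3.1, 2.9],
    "GENE_B": [5.0, 4.8, 5.1, 5.2],
}
gct_text = make_gct(samples, toy_expression)
cls_text = make_cls(["control", "control", "disease", "disease"])
```

Write gct_text and cls_text to files ending in .gct and .cls so that GSEA recognizes them when you load the data.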

3. Run GSEA 
> Important: Because permutation is random, the output results vary from run to run. This is actually a property of Gene Set Analysis. Therefore, run GSEA several times and find the optimal results. You should find that the top-ranked Regulons stay consistent across several trials, though.
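
One simple way to check this consistency is to keep only the Regulons that appear near the top in every run. The sketch below assumes you have already pulled the ranked Regulon names out of each run's report by hand or by your own script; the function consistent_top and the toy names are hypothetical and not part of GSEA.

```python
def consistent_top(runs, top_n=20):
    """Return the names that appear in the top_n of every run.

    runs: list of ranked Regulon-name lists, best first, one list per GSEA run.
    The result keeps the order of the first run.
    """
    if not runs:
        return []
    shared = set(runs[0][:top_n])
    for ranked in runs[1:]:
        shared &= set(ranked[:top_n])
    return [name for name in runs[0][:top_n] if name in shared]


# Toy example (hypothetical names): three runs, compare the top 3 of each.
toy_runs = [
    ["R1", "R2", "R3", "R4"],
    ["R2", "R1", "R4", "R3"],
    ["R1", "R2", "R5", "R3"],
]
stable = consistent_top(toy_runs, top_n=3)
print(stable)
```

Here only R1 and R2 are in the top 3 of all three runs, so they are the ones to trust most.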
GSEA comes in two versions: an R version and a GUI version. Both use the same file formats described in "2. Make file format". To use the R version, you need to know its input parameters. I'm not familiar with the parameters of the R version, so I recommend using the GUI version. One inconvenience of the GUI version is that you need Java installed on your machine.
Parameters in GSEA are important. For example, running GSEA requires permutation. There are two ways to permute: one is Phenotype (array samples) and the other is Gene Set (here, we use Regulons). The default permutation parameter of the R version is Phenotype. Therefore, if you don't have enough array samples in your expression data, the R version will not run unless you change to the Gene Set permutation method. I'm sure there is a way to modify the parameters, but I'm not clear on the R version.
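
To get a feeling for why few array samples are a problem for Phenotype permutation, you can count how many different ways the class labels can be assigned to the samples. This is only an illustration of the counting argument for a two-class design, not a description of what the R script actually checks; the function name is made up, and math.comb needs Python 3.8 or newer.

```python
import math


def distinct_label_assignments(n_class_a, n_class_b):
    """Number of distinct ways to assign two class labels to the samples.

    This is "n choose k": pick which samples get the first label.
    """
    return math.comb(n_class_a + n_class_b, n_class_a)


# Compare a few two-class designs against a typical 1000 permutations.
for n_a, n_b in [(2, 2), (3, 3), (5, 5), (10, 10)]:
    count = distinct_label_assignments(n_a, n_b)
    print("%d vs %d samples: %d distinct label assignments" % (n_a, n_b, count))
```

With 3 vs 3 samples there are only 20 distinct label assignments, far fewer than 1000, while the number of possible random gene sets does not depend on the number of samples.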
1) R version - command line version
You can download the R version of GSEA from Orion: netbio/R.GSEA/
There is an example, GDS1012. You can see a sub directory, /GDS1012.C0085786/, which contains example output files from running the R version of GSEA.
You are welcome to test the R version of GSEA with this sample. Before you test, please read the README carefully. (Dr. Sohyun Hwang wrote the README. Thanks to her.) You can also check what the file formats look like by opening GDS1012.C0085786.gct, DOLite.gmt, and GDS1012.C0085786.cls in the sub directory, /GDS1012.C0085786/.
  
2) GUI version - Graphical User Interface
The GUI version offers a very intuitive interface, so you can easily check and modify many parameters. You can run GSEA even if you do not have enough array samples by using Gene Set (Regulon) permutation. I personally prefer Gene Set permutation to Phenotype permutation because we can get thousands of Regulons from our functional gene network. It may be more robust than Phenotype permutation. (But I'm not sure.)

To decide which method suits your project, read this paper about permutation methods. (Thanks to Dr. Sohyun Hwang)
http://www.ncbi.nlm.nih.gov/pubmed/?term=discovering%20statistically%20significant%20pathways%20in%20expression%20profiling%20studies

>How to run
(1) Load data: load the gct, gmt, and cls files.
(2) Run GSEA: 
   - Required fields
	Expression dataset - your input gct file will show up.
	Gene sets database - gmt file; click Gene matrix (local gmx/gmt) to show your input gmt file.
	Number of permutations - fixed, 1000
	Phenotype labels - the phenotype labels will show up according to the information in your cls file.
	Collapse dataset to gene symbols - default true; in some cases you have to change it to false.
	==> GSEA supports the array chip platforms of major organisms. In that case you can leave this option true and select the platform in the Chip platform(s) option. However, if your target array platform is not supported, choose false to avoid trouble when running.
	Permutation type - default phenotype; you can choose gene_set instead.
	Chip platform(s) - if the 5th option, Collapse dataset to gene symbols, is true, you have to select the chip platform on which your array samples were run.

   - Basic fields and Advanced fields: you can also modify or adjust these parameters depending on your input data types and project.
(3) Click Run: you can run GSEA multiple times with different options, as many runs as your CPU can handle.


